XML-Grammar-Fortune

Shlomi Fish on 2008-06-12T09:29:26

One of my many computerised passions is collecting quotes in UNIX-like fortune format. Over the years, I have built up a moderately large collection of them in several files. As time went on, I noticed a few problems. First of all, they were all in large plaintext files, so pointing someone to a quote involved giving a link to the fortune file and saying "search for Foobar". Moreover, since they were just chunks of text, they couldn't hold any meta-data.

At one time, I heard of someone who had created an XML grammar to describe Unix fortunes, but a Google search was no help in finding it. I also have the grand "Fortunes Mania" vision of a community site for collecting and sorting quotes. This vision was very intimidating, but recently I decided to take a small baby step by defining a grammar for fortunes as XML. So I present to you the XML-Grammar-Fortune distribution.

I've taken quite a lot of time to think about what I wanted there. One thing I concluded was that there are several different types of fortune cookies: run-of-the-mill quotes, IRC conversations, excerpts from screenplays, structured plaintext, HTML, etc. Therefore, the XML grammar should allow several different types of sub-nodes, each of which corresponds to a certain class of fortune cookies.

Until now I've used DTDs for defining my XML schemas, but for XML-Grammar-Fortune, I decided to learn Relax NG, which I was told was easier than W3C XML Schema. I was very impressed by Relax NG: it's easy, it's fun, and it's powerful. One problem I encountered was that, when validating a document against a Relax NG schema, XML::LibXML (version perl-XML-LibXML-1.66-1mdv2008.1) does not give the line number where the validation error occurred. To work around this, one needs to look at the diffs or bisect the document.
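Bisecting a document by hand gets tedious, so here is a minimal sketch of how the search could be automated. It is written in Python rather than Perl, and it is not what XML-Grammar-Fortune does: find_first_invalid and has_ids are hypothetical names, and the is_valid callback stands in for a real Relax NG validation of a document rebuilt from a subset of the fortunes. It assumes that once a prefix of the document fails to validate, every longer prefix fails too.

```python
def find_first_invalid(items, is_valid):
    """Return the index of the first item that makes validation fail.

    `is_valid` takes a list of items and returns True or False.
    Returns None when the whole list validates.
    """
    if is_valid(items):
        return None
    lo, hi = 0, len(items)
    # Invariant: items[:lo] validates (the empty prefix is assumed valid),
    # while items[:hi] does not.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_valid(items[:mid]):
            lo = mid
        else:
            hi = mid
    return lo


# Stand-in validator (an assumption for this sketch): every fortune
# must carry an "id", as the grammar requires.
def has_ids(fortunes):
    return all("id" in fortune for fortune in fortunes)


fortunes = [{"id": "fortune-%d" % i} for i in range(8)]
fortunes[5] = {"title": "this one is missing its id"}
bad_index = find_first_invalid(fortunes, has_ids)
```

With a real validator plugged in, each probe validates only a prefix of the fortunes, so a file with N fortunes needs roughly log2(N) validation runs to locate the offending one.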

Anyway, I defined a Relax NG schema for the documents, and made sure that some basic examples validated (test-driven-development style). Then I worked on an XSLT stylesheet to convert them to XHTML.

When I started, I had only one fortune type, <raw>, which is a gigantic <pre> block with some meta-data. I gradually implemented more fortune types: irc, quote and screenplay, whose RNG and XSLT were based on XML-Grammar-Screenplay, with a lot of ugly copying-and-pasting.

I gradually converted more and more fortunes to have richer XML semantics. The XML grammar requires an id for each fortune, and also allows specifying a title element and some fields in the <info> tag, like "author" or "work". For example, all the "Friends" fortunes were converted by first normalising the screenplay text and then running a script I wrote to turn it into XML.

Now I had all the fortunes as XML, but the plaintext versions had gone out of sync. So I coded a Perl module to convert them from XML to plaintext.
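To give a feel for what such a conversion involves, here is a rough sketch in Python (not the Perl module itself). The <raw> and <info> elements, the "author" and "work" fields, the title, and the required id come from the description above; the <collection>, <fortune> and <body> names, the nesting, and the function names are assumptions made for this example and may not match the real grammar. The output follows the usual fortune file convention of separating entries with a line containing only "%".

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the layout is a guess made for this
# sketch, not the actual XML-Grammar-Fortune schema.
SAMPLE = """\
<collection>
  <fortune id="quote-1">
    <title>On Testing</title>
    <raw>
      <body>Line one of the quote.
Line two of the quote.</body>
      <info>
        <author>Some Author</author>
        <work>Some Work</work>
      </info>
    </raw>
  </fortune>
  <fortune id="quote-2">
    <raw>
      <body>A fortune with no attribution.</body>
    </raw>
  </fortune>
</collection>
"""


def fortune_to_text(fortune):
    """Render one <fortune> element as plaintext."""
    raw = fortune.find("raw")
    lines = [raw.findtext("body", default="").strip("\n")]
    author = raw.findtext("info/author")
    work = raw.findtext("info/work")
    if author:
        attribution = "    -- " + author
        if work:
            attribution += ", " + work
        # Blank line between the quote and its attribution.
        lines.extend(["", attribution])
    return "\n".join(lines)


def collection_to_text(xml_source):
    """Render every fortune, each followed by a line holding only '%'."""
    root = ET.fromstring(xml_source)
    chunks = [fortune_to_text(f) for f in root.findall("fortune")]
    return "".join(chunk + "\n%\n" for chunk in chunks)


plaintext = collection_to_text(SAMPLE)
```

Generating the plaintext from the XML in this way keeps the XML as the single source, so the two versions cannot drift apart again.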

I should note that, due to a problem with XML-LibXSLT and perl-5.10.0, I have not uploaded it to CPAN yet, because I do not want to receive a lot of failure reports.

On a different note: a former co-worker of mine read "Perl for Perl Newbies" in order to learn Perl, liked it a lot, and told me I should add more to it. That also feels good.